11.2 Markov Chains
make from a long, unbroken observation. The problem becomes particularly acute when evidence for higher-order Markov chains is sought, since the quantity of data required may be unattainable.⁸ An important result is Whittle's formula giving the distribution of the transition count:
$$
N^{(n)}_{uv}(F) \;=\; \frac{\prod_i f_{i\cdot}!}{\prod_{i,j} f_{ij}!}\, F^{*}_{uv},
\tag{11.9}
$$
where $N^{(n)}_{uv}(F)$ is the number of sequences $(a_1, a_2, \ldots, a_{n+1})$ having transition count $F = \{f_{ij}\}$ and satisfying $a_1 = u$ and $a_{n+1} = v$. The transition count, together with the initial state (with probability $p_{a_1}$), forms a sufficient statistic for the process, since

$$
p_{a_1} p_{a_1 a_2} \cdots p_{a_n a_{n+1}} \;=\; p_{a_1} \prod_{ij} p_{ij}^{f_{ij}},
\tag{11.10}
$$

where the left-hand side is simply the probability of realizing a particular sequence $(a_1, a_2, \ldots, a_{n+1})$. For $i, j = 1, \ldots, s$, $f_{ij}$ is the number of $m$, with $1 \le m \le n$, for which $a_m = i$ and $a_{m+1} = j$; $F$ is therefore an $s \times s$ matrix such that $\sum_{ij} f_{ij} = n$ and such that $f_{i\cdot} - f_{\cdot i} = \delta_{iu} - \delta_{iv}$, $i = 1, \ldots, s$, for some pair $u, v$, where $f_{i\cdot} = \sum_j f_{ij}$, and $\{f_{i\cdot}\}$ and $\{f_{\cdot j}\}$ are the frequency counts of $\{a_1, \ldots, a_n\}$ and $\{a_2, \ldots, a_{n+1}\}$, respectively, from which $f_{i\cdot} - f_{\cdot i} = \delta_{i a_1} - \delta_{i a_{n+1}}$. In Eq. (11.9), $F^{*}_{uv}$ is the $(v, u)$th cofactor of the matrix $F^{*} = \{f^{*}_{ij}\}$, with components

$$
f^{*}_{ij} =
\begin{cases}
\delta_{ij} - f_{ij}/f_{i\cdot} & \text{if } f_{i\cdot} > 0, \\
\delta_{ij} & \text{if } f_{i\cdot} = 0 .
\end{cases}
\tag{11.11}
$$
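As a concrete check on Eq. (11.9), here is a minimal sketch (Python; not from the text) that enumerates every sequence $(a_1, \ldots, a_{n+1})$ over a small alphabet, tallies how many sequences share each transition count and each pair of endpoint states, and compares the tallies with the multinomial-times-cofactor expression above. The function names (`f_star`, `whittle_count`, `brute_force_counts`) and the choice $s = 3$, $n = 6$ are illustrative assumptions.

```python
# Minimal sketch (assumed, not from the text): brute-force verification of
# Whittle's formula, Eq. (11.9), using the matrix F* of Eq. (11.11).
# States are labelled 0, ..., s-1 here instead of 1, ..., s.
from itertools import product
from math import factorial
from fractions import Fraction


def f_star(F):
    """Matrix F* of Eq. (11.11): delta_ij - f_ij/f_i. (row left as delta_ij if f_i. = 0)."""
    s = len(F)
    return [[Fraction(int(i == j)) - (Fraction(F[i][j], sum(F[i])) if sum(F[i]) else 0)
             for j in range(s)] for i in range(s)]


def det(M):
    """Exact determinant by cofactor expansion (fine for the tiny matrices used here)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def whittle_count(F, u, v):
    """Right-hand side of Eq. (11.9): (prod_i f_i.!)/(prod_ij f_ij!) times the (v,u)th cofactor of F*."""
    s = len(F)
    top = 1
    for i in range(s):
        top *= factorial(sum(F[i]))
    bot = 1
    for i in range(s):
        for j in range(s):
            bot *= factorial(F[i][j])
    Fs = f_star(F)
    minor = [[Fs[i][j] for j in range(s) if j != u] for i in range(s) if i != v]
    cofactor = (-1) ** (u + v) * (det(minor) if minor else Fraction(1))
    return Fraction(top, bot) * cofactor


def brute_force_counts(s, n):
    """Group all sequences (a_1,...,a_{n+1}) by (transition count F, first state, last state)."""
    counts = {}
    for seq in product(range(s), repeat=n + 1):
        F = tuple(tuple(sum(1 for m in range(n) if seq[m] == i and seq[m + 1] == j)
                        for j in range(s)) for i in range(s))
        key = (F, seq[0], seq[-1])
        counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    s, n = 3, 6   # illustrative sizes; 3^7 = 2187 sequences in total
    for (F, u, v), count in brute_force_counts(s, n).items():
        assert whittle_count([list(row) for row in F], u, v) == count
    print("Eq. (11.9) reproduces every brute-force count for s = 3, n = 6.")
```

Exact rational arithmetic (`fractions.Fraction`) is used for $F^{*}$ and its cofactor so that the comparison with the integer sequence counts is exact.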
Problem. Prove that if $P$ is stochastic, then any power of $P$ is also stochastic.
The entropy of the transitions (i.e., the weighted variety of the transitions) can be found from each row of the stochastic matrix according to Eq. (6.5). The (informational) entropy of the process as a whole is then the weighted average of these entropies, the weighting being given by the equilibrium distribution of the states. Hence, in a sense, the entropy of a Markov process is an average of averages.
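To make this "average of averages" concrete, here is a minimal sketch (Python; not from the text) that computes the equilibrium distribution of a stochastic matrix as the left eigenvector belonging to eigenvalue 1 and then forms the weighted average of the row entropies. The example matrix and the use of base-2 logarithms (bits) standing in for the convention of Eq. (6.5) are assumptions for illustration.

```python
# Minimal sketch (assumed, not from the text): the entropy of a Markov process
# as the equilibrium-weighted average of the row entropies of its stochastic
# matrix.  Base-2 logarithms (bits) are an assumed convention standing in for
# Eq. (6.5); the matrix P below is an arbitrary two-state example.
import numpy as np


def stationary_distribution(P):
    """Equilibrium distribution: left eigenvector of P for eigenvalue 1, normalized to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(P.T)
    pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    return pi / pi.sum()


def row_entropy(row):
    """Entropy -sum_j p_j log2 p_j of a single row of the stochastic matrix."""
    p = row[row > 0]
    return float(-np.sum(p * np.log2(p)))


def process_entropy(P):
    """Weighted average of the row entropies, the weights being the equilibrium distribution."""
    pi = stationary_distribution(P)
    return float(sum(pi[i] * row_entropy(P[i]) for i in range(len(P))))


if __name__ == "__main__":
    P = np.array([[0.9, 0.1],       # an arbitrary example, not the chain of the
                  [0.2, 0.8]])      # problem below
    print("equilibrium distribution:", stationary_distribution(P))
    print("entropy of the process (bits per step):", process_entropy(P))
```

The same two functions apply unchanged to the three-state chain of the following problem.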
Problem. Consider the three-state Markov chain
$$
\begin{array}{c|ccc}
\rightarrow & 1 & 2 & 3 \\
\hline
1 & 0.1 & 0.9 & 0.0 \\
2 & 0.5 & 0.0 & 0.5 \\
3 & 0.3 & 0.3 & 0.4
\end{array}
$$
and calculate (i) the equilibrium proportions of the states 1, 2, and 3 and (ii) the
average entropy of the entire process.
8 See Billingsley, especially for the proof of Whittle’s formula, Eq. (11.9).